
CBoms #51

Merged
morzan1001 merged 63 commits into main from
feature/cbom-phase1
Apr 29, 2026

Conversation

@morzan1001
Owner

Since there was a request for this in #48, I've started making crypto dependencies and settings trackable via CBOMs. This pull request contains the first backend changes; a second one will follow.

Add crypto_policy service package with four YAML seed files (NIST SP 800-131A,
BSI TR-02102, CNSA 2.0, NIST PQC) and a seeder that idempotently upserts the
system policy only when the stored version is below CURRENT_SEED_VERSION.
Add CryptoAssetRepository.ensure_indexes(), CryptoPolicyRepository.ensure_indexes(),
and seed_crypto_policies() calls to the application startup handler so that
crypto collection indexes and the built-in seed policy are ready on boot.
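
For illustration, the version gate in the seeder can be sketched like this (a minimal in-memory stand-in; the real seeder writes to MongoDB, and `CURRENT_SEED_VERSION`'s value here is an assumption):

```python
# Hypothetical sketch of the version-gated seeding described above.
CURRENT_SEED_VERSION = 4  # assumed value, for illustration only

def seed_system_policy(store: dict, seed_policy: dict) -> bool:
    """Upsert the built-in system policy only when the stored copy is
    missing or older than CURRENT_SEED_VERSION. Returns True on write."""
    existing = store.get("system")
    if existing is not None and existing["seed_version"] >= CURRENT_SEED_VERSION:
        return False  # already up to date: idempotent no-op
    store["system"] = {**seed_policy, "seed_version": CURRENT_SEED_VERSION}
    return True
```

Calling it twice with the same store writes once and then no-ops, which is what makes re-running the startup handler safe.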
Implements Task 3.1 of the CBOM Phase 1 plan with three read-only endpoints:
- GET /api/v1/projects/{project_id}/crypto-assets: list with pagination and filtering
- GET /api/v1/projects/{project_id}/crypto-assets/{asset_id}: get single asset
- GET /api/v1/projects/{project_id}/scans/{scan_id}/crypto-assets/summary: aggregate by type

All endpoints require project membership and return 403 if user lacks access.
Pagination limit capped at 500 to prevent resource exhaustion. Filtering supports
asset_type, primitive, and name_search (case-insensitive regex).

Test coverage includes repository integration tests verifying pagination, filtering,
and summary aggregation logic.
morzan1001 and others added 29 commits April 21, 2026 16:35
* feat: add Phase 2 crypto finding types (cert lifecycle + weak protocol)

* feat: add expiry-thresholds and cipher-weakness fields to CryptoRule

* feat: add IANA TLS cipher-suite catalog + loader

* feat: seed cert-lifecycle + protocol-cipher default rules, bump seed version

* feat: add CertificateLifecycleAnalyzer with 7 checks

* feat: add ProtocolCipherSuiteAnalyzer using IANA catalog

* test: include Phase-2 types in crypto-values set for collision test

* feat: register Phase-2 analyzers in registry

* test: integration tests for Phase-2 cert + cipher analyzers

* feat: add analytics response schemas (HotspotEntry, TrendSeries, ScanDelta)

* feat: add analytics:global permission to admin preset

* feat: add analytics scope resolver with project/team/global permission gating

* feat: add TTLCache for analytics query results

* feat: denormalize scan_created_at into findings with backfill migration

* feat: add CryptoHotspotService with 5-dimensional grouping

* feat: add CryptoTrendService with 8 time-bucketed metrics

* feat: add scan-delta service keyed on (name, variant, primitive)

* feat: expose /api/v1/analytics/crypto endpoints (hotspots, trends, scan-delta)

* perf: add indexes supporting analytics aggregation queries

* feat(frontend): add crypto-analytics types and API client

* feat(frontend): add AnalyticsViewSwitcher with URL-synced state

* feat(frontend): add HotspotTable primary view

* feat(frontend): add HotspotHeatmap view

* feat(frontend): add HotspotTreemap and HotspotBarChart views

* feat(frontend): add TrendsTimeSeriesChart with metric label map

* feat(frontend): add ScanDeltaView + ProjectScans delta trigger

* feat(frontend): add CryptoHotspotsPage with view switcher

* feat(frontend): add CryptoTrendsPage and wire sub-tabs into CryptographyTab

* feat(frontend): add CrossProjectNetworkView for team/global scopes

* feat(frontend): add team crypto-analytics sub-tab

* feat(frontend): add admin global crypto analytics + Metabase deep-link

* feat: add MCP tools for crypto hotspots, trends, and scan-delta

* docs: document VITE_METABASE_CRYPTO_DASHBOARD_URL env var

* revert: remove admin + team crypto analytics pages and Metabase references

Crypto analytics stay project-scoped in the Cryptography tab.
The admin /settings/crypto-analytics page, the team-analytics sub-tab in
TeamMembersDialog, and the CrossProjectNetworkView component are removed.
All Metabase deep-link references are stripped (no such dashboard exists).

Backend analytics endpoints for team/global scopes remain — they serve
MCP tool calls and are ready for future UIs if needed.

* refactor: move crypto analytics into main /analytics page

Removes the Cryptography top-level tab from ProjectDetails. All crypto
analysis views (Hotspots, Trends, Inventory, Findings) now live in the
shared /analytics page as a new Cryptography tab, matching the pattern
of VulnerabilityHotspots and other cross-project analyses.

Backend: adds a 'user' scope to ScopeResolver that resolves to all
projects the current user has access to (no elevated permission needed).

Per-project 'Crypto Policy' tab on ProjectDetails is preserved - that
is configuration, not analytics.

* fix: include 'user' in crypto analytics scope regex

* fix(tests): correct $sum accumulator init in _FakeDb aggregate helper

* chore(frontend): remove dead CryptoAssetTable component

* refactor(frontend): simplify crypto tab permission check to analytics:read

* refactor(frontend): drop unused 'network' variant from AnalyticsView

* fix: extend 'user' scope support to HotspotResponse schema and _FakeCollection.find projection arg

* feat: add CRYPTO_KEY_MANAGEMENT FindingType for crypto-misuse SAST rules

* feat: tag crypto-misuse-* SAST rules as CRYPTO_KEY_MANAGEMENT findings

* feat: wire crypto-misuse rules into SAST templates + standalone scan

* feat: add PolicyAuditEntry model + PolicyAuditAction enum

* feat: add compute_change_summary for policy audit diffs

* feat: record_policy_change service + crypto_policy.changed webhook event

* feat: PolicyAuditRepository with insert/list/get/delete-older-than

* feat: write PolicyAuditEntry on every crypto-policy mutation

Hook record_policy_change into PUT /crypto-policies/system,
PUT /projects/{id}/crypto-policy, DELETE /projects/{id}/crypto-policy,
and the seeder. Accept optional comment field on PUT bodies.
Add owner_auth_headers_proj_p2 fixture to conftest.

* feat: policy audit list/detail/revert/prune endpoints

Add policy_audit router with GET/POST/DELETE endpoints for system and
project audit history. Register router in main.py under /api/v1 prefix.

* feat: policy audit retention + startup pruning

Add prune_old_audit_entries() service (honours POLICY_AUDIT_RETENTION_DAYS),
wire PolicyAuditRepository.ensure_indexes() and the retention pruner into
app startup. Add distinct() to _FakeCollection. Fix prune endpoint datetime
query parsing to tolerate URL-encoded '+' timezone offset.

* feat: add compliance reporting schemas + enums

* test: extend fake DB count_documents to support $in and comparison operators

* feat: ComplianceReport model + repository

* feat: compliance framework base + default evaluator + EvaluationInput

* feat: NIST SP 800-131A + BSI TR-02102 + CNSA 2.0 compliance frameworks

* feat: FIPS 140-3 + ISO 19790 algorithm-level compliance frameworks

* feat: ComplianceReportEngine orchestrator (placeholders for renderer pipeline)

* feat: JSON + CSV renderers for compliance reports

* feat: SARIF 2.1.0 renderer for compliance reports

* feat: PDF renderer via WeasyPrint + renderer registry

* feat: engine _gather_inputs + _render + _store_artifact implementations

* feat: compliance report REST endpoints + background generation

* test: format-coverage + expiry integration tests for compliance reports

* feat: PQC migration mappings YAML + loader + schemas

* feat: PQC migration priority scoring (exposure/weakness/deadline/count)

* feat: PQC migration generator + REST endpoint + drift sentinel

* feat: PQC-migration-plan meta-framework for compliance-report export

* feat: four Phase-3 MCP tools (PQC plan, reports list, audit, framework summary)

* feat(frontend): Phase 3 types + API clients for compliance / PQC / audit

* feat(frontend): PQC migration panel + table + detail drawer

* feat(frontend): compliance reports panel with polling + download

* feat(frontend): expose PQC + Compliance tabs in Crypto analytics

* feat(frontend): policy audit timeline + diff view + revert dialog

* feat(frontend): mount policy audit timelines on admin + project pages

* fix(compliance): PQC framework async evaluation to avoid asyncio.run in running loop

* fix(compliance): remove FIPS ECDSA phantom control (never matched findings)

* fix(compliance): CSV evidence_count now sums both findings and asset bom_refs

* test(compliance): assert engine passes real EvaluationInput to framework.evaluate

* feat(compliance): retention sweeper deletes expired reports + GridFS artifacts

* test: consolidate fake-DB range operator implementations ($gte/$lte/$gt/$lt)

* refactor(compliance): remove unused imports in frameworks/base.py

* refactor(pqc): expose clear_mappings_cache() for tests

* fix(frontend): keyboard-accessible rows in migration + compliance tables

* ruff format

* refactor(analytics): extract useAnalyticsView hook to its own module

Resolves react-refresh/only-export-components: the switcher module now
exports only its component + type; the hook moves to a sibling file and
is imported by both the switcher and the parent tab.

* fix(findings): prefer-const + derive scroll target without ref-read during render

Replaces an IIFE that read hasScrolledRef.current while mapping rows
with a pure findIndex up-front. hasScrolledRef is now only consulted
inside useEffect. Also tightens 'let res' to 'const res'.

* fix(project): remove setState-in-effect from AnalyzerSettingsDialog

Drops the reset-on-open useEffect; parent now passes
key={openSettingsAnalyzer} so the dialog remounts and re-initializes
its local state from props — the React-recommended pattern for
resetting state when an identifier changes.

Reference: https://react.dev/learn/you-might-not-need-an-effect

* style: fix ruff findings (unused imports + variables)

Auto-fixes from ruff --fix removed 40 unused imports across app/ and
tests/. Manually removed 4 unused local variables in chat/tools.py
(project_repo, finding_repo, scan_repo, waiver_repo) along with their
now-unused repository imports.

* fix(types): resolve all mypy errors (209 → 0)

Add type annotations across 39 backend files. No behavior change.

Key patterns fixed:
- FastAPI endpoints: typed `db: AsyncIOMotorDatabase` and `current_user: User`
- MongoDB aggregation pipelines annotated as `list[dict[str, Any]]`
- Scope resolver calls cast str → Literal["project","team","global","user"]
- Variable name collisions renamed (latest_rows, hotspots_pipeline)
- weasyprint import: # type: ignore[import-untyped] (no stubs published)
- Framework protocol conformance: ClassVar annotations where needed

Verified: ruff clean, mypy 0 errors, pytest 274 passed (+2 pre-existing
live-Mongo failures unchanged).

* fix(webhooks): expose all 6 webhook events in subscription UI

The webhook subscription dialog only surfaced scan_completed and
vulnerability_found while the backend has been accepting 6 event
types (plus the upcoming pqc_migration_plan.generated). Expand the
static event catalogue to include:
  - analysis_failed
  - crypto_asset.ingested
  - crypto_policy.changed
  - compliance_report.generated
  - pqc_migration_plan.generated

Each entry now carries a user-friendly label and description, and the
checkbox list has a max-height + overflow so the dialog stays compact.

Adds a smoke test that opens the dialog and asserts every event label
renders.

* feat(frontend): delete-report and prune-audit UIs for Phase-3 admin flows

The deleteReport() and pruneSystemAudit()/pruneProjectAudit() API
client functions were previously unreachable from the UI.

ReportDetailDrawer now exposes a destructive "Delete report" button
behind a confirmation dialog. On success it invalidates the
compliance-reports query and closes the drawer; on failure (e.g. 403
for non-owners) the backend error message is surfaced via a toast.

PolicyAuditTimeline grows a "Prune old entries" button in the header
(gated on canRevert, matching the existing admin gate for revert).
The new PruneAuditDialog prompts for a cutoff date (default: 180 days
ago), warns about the destructive nature of the operation, and calls
pruneSystemAudit/pruneProjectAudit. Backend-enforced min-cutoff
errors are rendered verbatim in the toast. Success toast reports the
deleted count.

Adds smoke tests for both flows.

* feat(compliance): permission-gated scope + scope_id in NewReportDialog

The dialog always posted scope="user" so admins could not create
team/project/global reports from the UI. Add a Scope select with
user (default), project, team and — when the caller has
system:manage or analytics:global — global.

For project/team scope, require a non-empty scope_id via an
additional text input; the field validates client-side and the
server-returned error message is surfaced via a toast on backend
validation failures.

Extends the existing smoke test to cover the default user-scope
payload and the permission gating on the Global option.

* fix(crypto-policy): expose all 12 finding-types in policy editor

The finding_type select was limited to crypto_weak_algorithm,
crypto_weak_key and crypto_quantum_vulnerable, but the backend
FindingType enum defines 12 crypto_* values (Phase-2 certificate
lifecycle + Phase-2 protocol weakness + Phase-3 key-management
hygiene).

Extends the CryptoFindingType union with the missing 9 values and
turns FINDING_TYPES into {value, label} entries with user-friendly
labels so the dropdown is readable instead of showing raw enum
strings.

* fix(audit): wire in-app notifications on policy changes

The previous implementation imported `app.services.notifications.service`
as a module and guarded every call with `hasattr`, so the methods on the
`NotificationService` instance were never reachable and notifications
silently never fired. `notify_users_with_permission` also did not exist
at all.

- Add `NotificationService.notify_users_with_permission` that queries
  active users holding any of the required permissions and fans out
  through the existing `notify_users` path.
- Import the module-level `notification_service` singleton correctly
  from `app.services.notifications.service` and drop the hasattr guards.
- System-scope policy changes now notify users holding `system:manage`
  or `analytics:global`; project-scope changes fetch the Project via
  ProjectRepository and call `notify_project_members` with the object.
- Best-effort semantics preserved by the existing try/except around
  `_notify_relevant_users` in `record_policy_change`.

* feat(pqc): fire pqc_migration_plan.generated webhook on plan generation

The Phase-3 webhook spec lists three new events but the endpoint that
serves PQC migration plans never fired the one for it. Now that plans
are first-class outputs of the compliance stack, consumers need a signal
to pick them up.

- Add `WEBHOOK_EVENT_PQC_MIGRATION_PLAN_GENERATED` to constants and
  include it in `WEBHOOK_VALID_EVENTS`.
- Wire a `BackgroundTasks` dispatch in the PQC endpoint so the webhook
  fires after the response is sent; failures are logged but never
  surfaced to the caller (mirrors compliance_reports._run_and_webhook).
- Payload carries scope, scope_id, total_items, status_counts and
  mappings_version so downstream consumers can evaluate relevance
  without refetching the plan.

* fix(compliance): CSV renderer propagates framework disclaimer

FIPS/ISO frameworks set a disclaimer (e.g. "algorithm-level conformance
only; CMVP module validation out of scope") that the PDF/JSON/SARIF
renderers surface but CSV silently dropped. A bare CSV export from a
FIPS report therefore read like a full certification pass.

- Prepend the disclaimer, framework identity and generation timestamp as
  `#`-prefixed comment lines before the header row. Excel, `pandas.read_csv
  (comment='#')` and most SIEM ingesters skip these automatically, so the
  column layout downstream consumers rely on stays intact.
- Add unit tests covering both branches (with disclaimer / without).

* fix(audit): handle pagination boundary in policy diff view

entries[idx + 1] is undefined when idx is the last loaded entry and
the query was capped at 50 rows. The diff view then rendered every
current rule as "added", which is wrong when there are more versions
below the window.

Detect the boundary via (isLast && !previous && entries.length >=
PAGE_SIZE && entry.version > 1) and, in that case, render a
"Previous version is beyond the loaded window" hint alongside a
read-only JSON snapshot of the current entry. Genuine first-version
entries (version === 1 or entries.length < PAGE_SIZE) still get the
regular diff view with previous === undefined handled by
PolicyDiffView as before.

Adds a dedicated test that seeds a 50-entry full page with versions
100..51 and asserts the truncation hint renders when the oldest
entry is expanded.

* fix(normalizer): detect crypto-misuse rules regardless of semgrep path prefix

Semgrep / OpenGrep may emit check_id either as the bare rule name
(`crypto-misuse-ecb-mode-python`) or as a dotted path when rules are
loaded from a filesystem (`rules.crypto-misuse.ecb-mode.crypto-misuse-ecb-mode-python`).
The previous `startswith` against the full string only
caught the first form, so crypto-misuse findings from a path-based
Semgrep invocation silently fell through as generic SAST and never
received the CRYPTO_KEY_MANAGEMENT tag the compliance pipeline expects.

Inspect the final dot-separated segment too, with the original
`startswith` kept as a fast-path. Add regression tests for the
nested-path shape plus negative cases.
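
The matcher amounts to something like this (a sketch, not the actual implementation; the prefix is taken from the example above):

```python
def is_crypto_misuse(check_id: str, prefix: str = "crypto-misuse-") -> bool:
    """Match both bare rule names and dotted filesystem-path check_ids."""
    if check_id.startswith(prefix):  # fast path: bare rule name
        return True
    # path-based invocations emit e.g. rules.crypto-misuse.ecb-mode.<rule>,
    # so also inspect the final dot-separated segment
    return check_id.rsplit(".", 1)[-1].startswith(prefix)
```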

* fix(analytics): invalidate cache on policy and waiver mutations

Hotspots, crypto trends and PQC migration plans are all computed on top
of the current crypto policy rules and the per-finding `waived` flag.
The 5-minute TTLCache in `app.services.analytics.cache` had no hook on
either mutation path, so admins saw up-to-five-minutes-stale results
after a rule toggle or waiver approval — the exact windows where
freshness matters most.

- `record_policy_change` now flushes the process-level analytics cache
  after the audit insert (still best-effort; failure never blocks the
  write).
- Waiver create/update/delete endpoints flush the cache synchronously
  before the background stats recalculation fires, closing the hole
  where a waived finding would still show as active in the next
  hotspots request.
- Unit test pins the behaviour by patching `get_analytics_cache` and
  asserting `clear()` fires during `record_policy_change`.

* fix(audit): enforce minimum prune cutoff to preserve forensic history

Before this change a system-manage admin could pass `?before=<yesterday>`
to DELETE /crypto-policies/system/audit and wipe the entire policy
change history in a single request — exactly the kind of action the
audit log exists to catch.

- Reject cutoffs newer than `now - POLICY_AUDIT_MIN_PRUNE_DAYS` with a
  400 and a clear message; default is 90 days of forensic retention,
  configurable via the env var. Invalid / non-positive env values fall
  back to the default so a typo cannot relax the guard.
- Apply the same check on the per-project prune endpoint.

Related fix in the same commit: both `revert_project_policy` and
`prune_project_audit` called `check_project_access(required_role="owner")`,
but "owner" is not in `PROJECT_ROLES = [viewer, editor, admin]`. The
helper raised `ValueError` on every invocation, returning a 500 to
admins legitimately trying to revert or prune. Normalised to
required_role="admin".
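
The cutoff guard with its typo-proof env fallback can be sketched as (the real endpoint returns HTTP 400 where this sketch raises `ValueError`):

```python
import os
from datetime import datetime, timedelta, timezone

DEFAULT_MIN_PRUNE_DAYS = 90  # default retention floor from the commit

def min_prune_days() -> int:
    """Read POLICY_AUDIT_MIN_PRUNE_DAYS, falling back to the default on
    invalid or non-positive values so a typo cannot relax the guard."""
    try:
        days = int(os.environ.get("POLICY_AUDIT_MIN_PRUNE_DAYS", ""))
    except ValueError:
        return DEFAULT_MIN_PRUNE_DAYS
    return days if days > 0 else DEFAULT_MIN_PRUNE_DAYS

def validate_cutoff(before: datetime) -> None:
    """Reject cutoffs newer than now - min_prune_days()."""
    floor = datetime.now(timezone.utc) - timedelta(days=min_prune_days())
    if before > floor:
        raise ValueError(f"cutoff must be older than {min_prune_days()} days")
```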

* refactor(compliance): tighten EvaluationInput.db type to AsyncIOMotorDatabase

`EvaluationInput.db` was typed `Optional[object]` so the type system
lost all knowledge of what consumers could call on it. The PQC
meta-framework worked around this with a runtime `cast`; other
frameworks that may need DB access in future would have had to do the
same.

Narrow the annotation to `Optional[AsyncIOMotorDatabase[Any]]` so
consumers get proper type-checking. Drop the now-unnecessary `cast`
and the `AsyncIOMotorDatabase` + `typing.cast` imports from
pqc_migration_plan.py. mypy still clean.

* refactor(compliance): replace magic status strings with enum values

`ComplianceReportRepository.count_pending_for_user` hard-coded the
strings `"pending"` and `"generating"` for its Mongo `$in` query.
If `ReportStatus` ever changes value casing or spelling the repo
would silently return zero without any test/type signal. Use
`ReportStatus.PENDING.value` / `.GENERATING.value` instead so the
source of truth stays in the enum.

* refactor: use CustomAPIRouter for all Phase-3 endpoints

Aligns Phase-3 endpoints with the project convention: CustomAPIRouter
sets response_model_by_alias=False so responses serialize 'id' instead
of '_id', matching every other endpoint in the codebase.

* refactor(webhooks): unify event naming to dot-notation with backward-compat aliases

Canonical names: scan.completed, vulnerability.found, analysis.failed.
Legacy snake_case names (scan_completed, vulnerability_found,
analysis_failed) remain accepted via WEBHOOK_EVENT_ALIASES map so
existing subscriptions in MongoDB keep working without a schema
migration. The dispatcher matches events using a $in query against
both canonical and alias forms.

Frontend WebhookManager shows canonical names in the dropdown but
continues to accept and display stored legacy values.
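
The alias handling can be sketched like this (a hypothetical subset of the alias map; the real `WEBHOOK_EVENT_ALIASES` lives in the backend constants):

```python
# Hypothetical alias map mirroring the commit message.
WEBHOOK_EVENT_ALIASES = {
    "scan.completed": "scan_completed",
    "vulnerability.found": "vulnerability_found",
    "analysis.failed": "analysis_failed",
}

def match_forms(event: str) -> list[str]:
    """All names a subscription may be stored under for this event,
    suitable for a Mongo {'events': {'$in': [...]}} query."""
    forms = {event}
    if event in WEBHOOK_EVENT_ALIASES:            # canonical -> legacy
        forms.add(WEBHOOK_EVENT_ALIASES[event])
    for canonical, legacy in WEBHOOK_EVENT_ALIASES.items():
        if legacy == event:                       # legacy -> canonical
            forms.add(canonical)
    return sorted(forms)
```

Dispatching with `$in` over both forms is what lets stored legacy subscriptions keep firing without a migration.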

* refactor(ingest): route CBOM-Ingest through ScanManager with full metadata

Align CBOM-Ingest with SBOM-Ingest: the payload now inherits BaseIngest,
accepting all CI metadata fields directly (pipeline_id, commit_hash,
branch, job_id, job_started_at, commit_message, commit_tag, project_url,
pipeline_url, pipeline_iid, project_name, pipeline_user). ScanManager
derives a deterministic scan_id (UUID5 of project+pipeline_id+commit_hash)
so re-submitting the same CI run upserts instead of creating duplicates,
and register_result('cbom', trigger_analysis=True) integrates the scan
into the standard analysis lifecycle.

Backward compatibility: legacy payloads wrapped in
{scan_metadata: {...}, cbom: {...}} still validate — a before-validator
folds scan_metadata.git_ref/commit_sha and friends onto the canonical
BaseIngest fields. The cboms.yml pipeline template already sends the
new flat shape; its metadata was previously being discarded because the
old endpoint only extracted scan_metadata.* keys.

Fixes H1 + H2.
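
The deterministic scan_id derivation amounts to something like (the namespace constant here is hypothetical; the real one lives in ScanManager):

```python
import uuid

# Hypothetical namespace; the real constant is defined in ScanManager.
SCAN_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "cbom-scan")

def derive_scan_id(project_id: str, pipeline_id: str, commit_hash: str) -> str:
    """UUID5 over project + pipeline + commit: re-submitting the same
    CI run yields the same scan_id, so the write upserts instead of
    creating a duplicate scan."""
    return str(uuid.uuid5(SCAN_NAMESPACE,
                          f"{project_id}:{pipeline_id}:{commit_hash}"))
```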

* feat(webhooks): fire sbom.ingested on SBOM ingest

New event symmetric to crypto_asset.ingested. Payload includes scan_id,
project_id, pipeline_id, commit_hash, branch, sboms_processed,
sboms_failed, dependencies_count. Best-effort: webhook failures never
block the ingest response.

Frontend WebhookManager exposes the event for subscription.

Fake-DB gains find_one_and_update so the integration test can exercise
the full ingest -> ScanManager -> register_result -> webhook flow
against the in-process test harness.

* docs(cache): document two-cache architecture and add reset helper

The codebase uses two complementary caches with distinct semantics:
  - app.core.cache.cache_service: async, Redis-backed, cross-pod
    shared. Use for external API responses (OSV, deps.dev, NPM, OIDC)
    where cross-pod dedup matters or upstream rate limits apply.
  - app.services.analytics.cache.TTLCache: sync, in-process, per-pod.
    Use for memoizing MongoDB aggregation output (hotspots, trends,
    PQC plans) where per-pod-per-TTL consistency is sufficient.

Added module-level docstrings on both files explaining when to use
which. Added reset_analytics_cache_for_tests() helper so tests can
patch or re-initialize the in-process singleton cleanly.

Addresses M4 (analytics cache strategies divergent). A full Redis
migration of analytics caching is deferred: the MongoDB aggregations
are cheap enough per-pod that the cross-pod dedup gain does not
justify the async call-site refactor today.
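
For reference, the in-process side can be sketched as a minimal TTL cache in the spirit of `app.services.analytics.cache` (the actual implementation differs in detail):

```python
import time

class TTLCache:
    """Minimal sync, in-process, per-pod TTL cache for memoizing
    aggregation output; entries are evicted lazily on read."""
    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self._ttl = ttl_seconds
        self._store: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:  # stale: drop and miss
            del self._store[key]
            return None
        return value

    def set(self, key: str, value) -> None:
        self._store[key] = (time.monotonic() + self._ttl, value)

    def clear(self) -> None:  # flushed on policy/waiver mutations
        self._store.clear()
```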

* feat(audit): add policy_type discriminator to PolicyAuditEntry

PolicyAuditEntry gains a policy_type field ('crypto' | 'license',
default 'crypto'). PolicyAuditRepository accepts policy_type in list /
get_by_version / delete_older_than / count.

Backward compatibility: queries for policy_type='crypto' also match
entries written before the field was added (MongoDB $or with
$exists: False). Legacy index (policy_scope, project_id, version) is
kept alongside a new composite index that leads with policy_type.

Fake-DB matcher extended to support $or/$and/$exists/$ne in a
shared helper (consolidating three previously divergent implementations
across _fake_match_doc, _doc_matches_query and _FakeCursor._matches).

* feat(audit): record license-policy changes with change-summary

Add compute_license_policy_change_summary() + record_license_policy_change()
to the audit history service. License changes are persisted in the same
collection as crypto-policy entries (using policy_type='license') and
fire the new license_policy.changed webhook event.

The version for license-policy entries is derived from the count of
existing license-policy entries for the project — license policy has
no explicit version column on the project doc.

Also:
- Generalize _dispatch_webhook + _notify_relevant_users to accept the
  event_type / subject_noun so both policy types share the same
  dispatch code.
- Add backward-compat alias record_crypto_policy_change.
- Extract 'No effective changes' to a module constant.

* feat(projects): audit license-policy changes on project update

PUT /api/v1/projects/{id} now captures the pre-update license policy,
compares it to the post-update state, and — on any effective change —
calls record_license_policy_change() which writes an audit entry with
policy_type='license' and fires the license_policy.changed webhook.

A helper _resolve_license_policy() merges the two shapes the codebase
uses for license policy storage:
  * project.analyzer_settings['license_compliance'] (canonical, Phase 2+)
  * project.license_policy (legacy top-level field)

Audit failures never block the project update (outer try/except + the
recorder is already fail-soft internally).

* feat(projects): list/get license-policy audit entries

New REST endpoints under the existing policy-audit router:
  GET /api/v1/projects/{id}/license-policy/audit
  GET /api/v1/projects/{id}/license-policy/audit/{version}

Both gated on viewer-level project access (reads only). Revert and
prune for license-policy are deferred — revert needs a merge strategy
for project.analyzer_settings that does not stomp peer settings.

* refactor(analytics): unify SBOM + CBOM scope resolution via ScopeResolver

- ScopeResolver._resolve_user now honours PROJECT_READ_ALL (super-user
  escape hatch) so the CBOM-scope semantics match the long-standing
  SBOM-analytics behaviour when callers pass scope='user'.
- get_user_project_ids (helper used across SBOM endpoints) becomes a
  thin shim over ScopeResolver.resolve(scope='user') — a single code
  path now owns permission checking + project-id enumeration for every
  analytics surface.
- generate_pqc_migration_plan (MCP tool) takes a user parameter and
  constructs its ResolvedScope via ScopeResolver instead of hand-
  assembling the dataclass. Matches the pattern every other tool uses.

Closes H3 and L2.

* feat(compliance): add SBOM-side frameworks (License Audit + CVE Remediation SLA)

Two new async-only frameworks that reuse the existing compliance engine,
renderers, retention, and GridFS artifact storage:

  * LicenseAuditFramework — evaluates project SBOM findings against the
    license policy (allow_strong_copyleft / allow_network_copyleft), plus
    a catch-all 'all components have identified licenses' control.
  * CveRemediationSlaFramework — checks that open vulnerabilities are
    fixed within platform SLAs (7 / 30 / 90 days for CRITICAL / HIGH /
    MEDIUM).

Both registered in FRAMEWORK_REGISTRY alongside the crypto frameworks
so the same /api/v1/compliance/reports endpoints / formats / webhooks
apply without any further wiring. Frontend NewReportDialog exposes them
in the Framework dropdown.

* refactor(frontend): extract useAnalyticsList hook for shared list scaffolding

New hook bundles the useQuery + isLoading + isEmpty + error pattern
every analytics panel in the codebase duplicates. HotspotTable is the
first consumer — VulnerabilityHotspots and other larger analytics
components can migrate incrementally in follow-up PRs.

The hook is generic over response + item type so it works with the
crypto HotspotResponse / VulnerabilityHotspot / PQC MigrationItem
shapes without coupling to any of them.

+3 unit tests covering the loading, empty, and error paths.

* refactor(compliance): consolidate duplicated framework helpers

Extract 3 helpers to base.py (public API): status_value,
extract_finding_id, build_summary, build_residual_risks. The license,
cve-remediation and pqc frameworks each had their own near-identical
_build_summary / _residual_risks / status_value / finding-id extraction
— now all four share one implementation.

Net effect: ~60 LOC removed from the 3 SBOM/PQC frameworks; behaviour
unchanged (20/20 framework tests still pass).

* refactor(frontend): extract useDialogState hook

Tiny hook that owns the useState(false) + openDialog/closeDialog/
toggleDialog boilerplate every shadcn Dialog needs. Consuming the hook
cuts 3 lines + one inline callback per Dialog owner.

ComplianceReportsPanel is the first consumer. Other Dialog owners
(WebhookManager, PruneAuditDialog, NewReportDialog, etc.) can migrate
incrementally in follow-up commits.

* refactor(crypto): IANA catalog uses live-fetch + Redis cache pattern

Align with how every other external-data analyzer in the codebase
handles upstream snapshots (OSV, EPSS, GHSA, deps.dev):

  1. In-process memoization for the hot path
  2. Redis cache_service for cross-pod deduplication (7-day TTL)
  3. Live fetch from iana.org with httpx.AsyncClient when Redis is cold
  4. Bundled YAML snapshot as offline fallback (air-gapped / first boot
     before internet is available)

The one-shot backend/scripts/generate_iana_catalog.py is removed — its
CSV-parse + weakness-derivation logic now lives inside the loader and
runs on every registry refresh automatically. The bundled YAML stays
committed as a documented, deterministic fallback (not the canonical
source any more).

protocol_cipher.py now awaits load_iana_catalog() in analyze() instead
of loading sync in __init__; the first analysis call after a cold
deploy hits iana.org and populates the shared Redis entry.
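The four-layer lookup described above can be sketched as follows. This is a minimal illustration, not the real loader: the `redis_get`/`redis_set`/`fetch_live`/`load_bundled` callables stand in for the cache service, the httpx fetch, and the YAML snapshot loader, and their names are assumptions.

```python
import asyncio
import time

_memo: dict[str, tuple[float, dict]] = {}
MEMO_TTL_SECONDS = 3600.0

async def load_iana_catalog(redis_get, redis_set, fetch_live, load_bundled) -> dict:
    """Layered lookup: in-process memo -> Redis -> live fetch -> bundled YAML."""
    now = time.monotonic()
    hit = _memo.get("iana")
    if hit is not None and now - hit[0] < MEMO_TTL_SECONDS:
        return hit[1]                                  # 1. hot path: in-process memo
    catalog = await redis_get("iana:catalog")          # 2. cross-pod Redis cache
    if catalog is None:
        try:
            catalog = await fetch_live()               # 3. live fetch from iana.org
            await redis_set("iana:catalog", catalog, ttl=7 * 24 * 3600)
        except Exception:
            catalog = load_bundled()                   # 4. committed snapshot fallback
    _memo["iana"] = (now, catalog)
    return catalog

async def _demo():
    calls = []

    async def redis_get(key):
        calls.append("redis_get")
        return None  # cold Redis

    async def redis_set(key, value, ttl):
        calls.append("redis_set")

    async def fetch_live():
        calls.append("fetch_live")
        return {"cipher_suites": 365}

    def load_bundled():
        return {"cipher_suites": 0}

    first = await load_iana_catalog(redis_get, redis_set, fetch_live, load_bundled)
    second = await load_iana_catalog(redis_get, redis_set, fetch_live, load_bundled)
    return first, second, calls

first, second, calls = asyncio.run(_demo())
```

The second call never touches Redis or the network: within the TTL, the memo absorbs the hot path, exactly as the first analysis call after a cold deploy is the only one that hits iana.org.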

* refactor(webhooks): centralise fire-and-forget pattern via safe_trigger_webhooks

Five callers (cbom_ingest, ingest, pqc_migration, compliance_reports,
audit/history) each wrapped webhook_service.trigger_webhooks in the same
try/except + logger.warning/exception boilerplate. Extract the wrapper
into webhook_service.safe_trigger_webhooks(...) — caller passes a
context= label that gets included in the log message.

Net effect: ~50 LOC removed from the 5 endpoint modules; exception
handling is now uniform (logger.exception, never logger.warning), so
log readers see consistent stack traces for every dispatch failure.

Audit/history keeps an outer try/except because _dispatch_webhook
constructs the payload inline — that step could raise on an unexpected
PolicyAuditEntry shape, independent of the webhook delivery itself.
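The extracted wrapper might look like the sketch below; the signature and logger name are assumptions, not the real service API.

```python
import asyncio
import logging

logger = logging.getLogger("webhook_service")

async def safe_trigger_webhooks(trigger, event: str, payload: dict, *, context: str) -> bool:
    """Fire-and-forget dispatch: a delivery failure never propagates."""
    try:
        await trigger(event, payload)
        return True
    except Exception:
        # Always logger.exception, so every dispatch failure logs a stack trace.
        logger.exception("Webhook dispatch failed (context=%s, event=%s)", context, event)
        return False

async def _demo():
    async def ok(event, payload):
        pass

    async def boom(event, payload):
        raise RuntimeError("delivery refused")

    return (
        await safe_trigger_webhooks(ok, "report.created", {}, context="compliance_reports"),
        await safe_trigger_webhooks(boom, "report.created", {}, context="compliance_reports"),
    )

results = asyncio.run(_demo())
```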

* refactor(models): extract MongoDocument base class (shared ConfigDict)

Four models (CryptoAsset, CryptoPolicy, PolicyAuditEntry, ComplianceReport)
all carried an identical model_config = ConfigDict(populate_by_name=True,
use_enum_values=True). Extract a MongoDocument base in app.models.types
that owns this config; subclasses inherit and only declare their fields.

Pydantic v2 merges model_config across the inheritance chain, so future
models that need additional config (e.g. extra='allow') can still extend
MongoDocument and add their own settings.
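A sketch of the base class and the config-merging behaviour (field names here are illustrative, not the real models):

```python
from enum import Enum

from pydantic import BaseModel, ConfigDict, Field

class MongoDocument(BaseModel):
    # Shared Mongo-document settings: allow population by field name
    # (so `_id` aliases work) and store enums as their values.
    model_config = ConfigDict(populate_by_name=True, use_enum_values=True)

class AssetType(str, Enum):
    ALGORITHM = "algorithm"

class CryptoAssetSketch(MongoDocument):
    # Pydantic v2 merges model_config across the inheritance chain,
    # so a subclass can add settings without repeating the base config.
    model_config = ConfigDict(extra="allow")

    id: str = Field(alias="_id")
    asset_type: AssetType

doc = CryptoAssetSketch(_id="asset-1", asset_type=AssetType.ALGORITHM)
```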

* refactor(frontend): migrate two dialog owners to useDialogState

PolicyAuditTimeline (PruneAuditDialog state) and ReportDetailDrawer
(delete-confirm state) now use the useDialogState hook from commit
97f1752 instead of useState(false) + setX(true)/setX(false)
boilerplate. Same behaviour, ~3 LOC less per consumer.

* refactor(constants): extract SPDX_* identifier constants for license maps

LICENSE_URL_PATTERNS and LICENSE_ALIASES both reference the same SPDX
IDs over and over (Apache-2.0 5×, GPL-2.0 4×, AGPL-3.0 4×, MPL-2.0
4×, LGPL-2.1 3×, GPL-3.0 4×, ...). Extract module-level constants

(SPDX_APACHE_2_0, SPDX_GPL_2_0, ...) so the maps reference symbols
rather than free-floating string literals. Eliminates the typo risk
flagged by SonarLint S1192.

* refactor(frontend): WebhookManager uses useDialogState

* refactor(frontend): split Recommendations.tsx into focused sub-modules

The 1227-LOC monolith is now four files under analytics/recommendations/:

  config.ts             ~245 LOC  priorityConfig + typeConfig + effortConfig
  RecommendationCard.tsx ~696 LOC  per-row rendering
  SummaryCard.tsx        ~157 LOC  summary header
  Recommendations.tsx     ~99 LOC  orchestration + data fetching (kept at
                                  the original path so consumers do not
                                  need to update imports)

Behaviour unchanged. The outer Recommendations.tsx had to remain at
its original path because macOS / Windows have case-insensitive
filesystems and a sibling 'recommendations/' directory cannot coexist
with a 'Recommendations.tsx' file in the same parent. Keeping the
component definition at the outer path side-steps that constraint while
still cleaning up the file size.

* refactor(analytics): split 1554-LOC analytics.py into a package

Old monolith is replaced by 8 focused modules under analytics/:

  __init__.py            22 LOC  aggregate router (re-exports for main.py)
  _shared.py             62 LOC  multi-endpoint helpers (_resolve_scan_id,
                                 _get_enrichment_info, _MSG_ACCESS_DENIED)
  summary.py            200 LOC  /summary, /dependencies/top, /dependency-types
  dependencies.py       259 LOC  /dependency-tree, /component-findings,
                                 /dependency-metadata
  risk.py               326 LOC  /impact, /hotspots
  search.py             459 LOC  /search, /vulnerability-search
  recommendations.py    231 LOC  /projects/{id}/recommendations
  update_frequency.py   155 LOC  /update-frequency, /comparison

All URLs unchanged: app/main.py still imports
'app.api.v1.endpoints.analytics' as before because __init__.py
re-exports the aggregate 'router' under the same name. No call sites
elsewhere in the codebase needed updating.

Endpoint-only helpers stayed co-located with their callers; only the
helpers used by 2+ endpoints moved to _shared.py.

* refactor(license): split 1437-LOC license.py into license_compliance package

Six modules under analyzers/license_compliance/:

  __init__.py        18 LOC  re-exports LicenseAnalyzer, LICENSE_DATABASE
  constants.py      582 LOC  string constants, regex splitters,
                             SEVERITY_RANK, LICENSE_INCOMPATIBILITIES,
                             LICENSE_DATABASE, CATEGORY_STAT_KEY,
                             lazy lowercase-mapping cache
  normalizer.py     161 LOC  normalize_license, extract_licenses,
                             has_spdx_expression, parse_spdx_expression
  compatibility.py  107 LOC  check_pair_conflict, collect_component_licenses,
                             find_license_conflicts, check_license_compatibility
  evaluator.py      424 LOC  per-category evaluate_* (weak/strong/network
                             copyleft), apply_transitive_adjustment,
                             should_include_finding, create_issue
  analyzer.py       315 LOC  LicenseAnalyzer class with orchestration +
                             back-compat wrappers for private methods that
                             tests in tests/test_services/test_analyzers/
                             test_license_analyzer.py call by name

The original analyzers/license.py is now a 10-LOC re-export shim so
existing 'from app.services.analyzers.license import LicenseAnalyzer'
imports keep resolving.

* refactor(chat): split 3022-LOC tools.py into tools package

Five modules under chat/tools/, the largest two split off the
TOOL_DEFINITIONS data block (~1015 LOC) and the dispatcher class:

  __init__.py        113 LOC  re-exports public surface; pre-imports the
                              symbols (ScopeResolver, PQCMigrationPlanGenerator,
                              ComplianceReportRepository, PolicyAuditRepository,
                              ComplianceReportEngine, FRAMEWORK_REGISTRY,
                              ReportFramework, ResolvedScope) so existing
                              patch('app.services.chat.tools.X') calls keep
                              working
  _helpers.py        273 LOC  module-level helpers: _clamp_limit, _clip_value,
                              _serialize_finding_for_llm, severity bucket fns,
                              version compare, URL injection, payload truncation,
                              _serialize_doc
  definitions.py    1038 LOC  pure data: TOOL_DEFINITIONS, TOOL_PERMISSIONS,
                              get_tool_definitions()
  crypto_tools.py    324 LOC  the 12 module-level async tool fns
                              (list_crypto_assets, get_crypto_summary,
                              generate_pqc_migration_plan, list_compliance_reports,
                              list_policy_audit_entries,
                              get_framework_evaluation_summary, ...)
  registry.py       1454 LOC  ChatToolRegistry class (the dispatcher) — kept
                              intact; splitting it would risk regressions in
                              23+ tool branches

crypto_tools.py looks up patched symbols (ScopeResolver etc.) lazily off
the package namespace via a tiny _pkg() helper, so existing test patches
on app.services.chat.tools.X keep targeting the right attribute. The
old tools.py is deleted; the package directory replaces it.

* refactor(aggregation): split 1161-LOC aggregator into focused modules

Break aggregator.py into an aggregation/ sub-package with one
responsibility per file. ResultAggregator stays in aggregator.py and
keeps thin underscore-prefixed wrappers so tests that patch private
methods continue to work.

- components.py: normalize_component, extract_artifact_name
- cross_link.py: cross_link_pair, add_context_to_vulnerability
- merging.py: SAST/vuln/findings merge helpers
- quality.py: update_quality_description
- scorecard.py: enrich_with_scorecard (cache passed in)
- versions.py: parse_version_key, calculate_aggregated_fixed_version

Old aggregator.py becomes a 10-line re-export shim for external
callers.

* chore(backend): trim verbose comments in aggregation package

Drop module docstring boilerplate, comments that restate the next line
of code, and inline 'what' annotations. Keep workaround/why comments
intact.

* chore(frontend): trim verbose comments and dead JSDoc

Drop docstrings that restate function names, inline 'what' comments,
and historical migration notes. Shorten the WebhookManager EVENT_ALIASES
preamble and remove stale lingering-thought TODOs.

* chore(backend): trim verbose comments across services and core

Second pass covering chat tools, license compliance, analytics endpoints,
core utilities, repositories, and other services. Drop docstrings that
restate function names, inline 'what' annotations, section dividers,
and stale migration commentary. Keep workaround/why comments intact.

* fix(waivers): correct global-waiver lookups and stale UI after mutation

- MCP tools queried {global: True} but the Waiver model uses
  project_id=None for global scope, so get_waiver_status,
  list_global_waivers, and get_expiring_waivers never matched any
  global waiver. Switch to project_id=None and add the global branch
  to the expiring-waivers $or filter.
- Drop the {status: $ne expired} filter — only accepted_risk and
  false_positive exist as statuses; expiry is enforced via
  expiration_date alone.
- get_severity_distribution and get_vuln_counts_by_components now
  exclude waived findings, matching the convention used by stats.py
  and the other MCP tools that already filter waived: $ne True.
- Waiver create/update/delete mutations now also invalidate scan,
  analytics, and project query keys so finding lists, severity
  badges, and dashboard counts refresh without manual reload.

* chore(backend): remove dead severity/type count repository methods

get_severity_counts and get_type_counts on FindingRepository have no
callers anywhere in the codebase. They also missed the waived-finding
filter that the live get_severity_distribution now applies, so leaving
them in place would just be a footgun for future callers.

* refactor(recommendation): drop dead try/except in calculate_best_fix_version

Both branches returned the same value, and parse_version_tuple cannot
raise (its int() input comes from a digits-only regex match), so the
guard was unreachable defensive code without a documented threat model.
Inline the sort and let any future regression surface as a real error.

* refactor(frontend): share extractErrorMessage via lib/errors.ts

ReportDetailDrawer and PolicyAuditTimeline both carried byte-identical
copies of the API-error extractor. Extract it once so future error
shapes only need updating in one place.

* refactor(frontend): unify date formatting via formatDate/formatDateTime

Compliance, audit, crypto-analytics, and PQC views all bypassed the
formatDate / formatDateTime helpers in lib/utils.ts and called
toLocaleString / toLocaleDateString inline. Route them through the
helpers so display formatting stays consistent and we have a single
seam for future changes (i18n, fixed locale, invalid-date fallback).

* chore(api): parameterise DatabaseDep with AsyncIOMotorDatabase[Any]

Other definitions in init_db.py already use the parameterised form; the
DatabaseDep alias was the odd one out. Aligning means a future
mypy --strict pass won't have to deal with this single missing type
argument across every endpoint that depends on it.

* perf(compliance): drop unread fields and warn on findings cap hit

_collect_findings loaded every column of up to 20k findings into memory
to evaluate compliance, even though no framework reads description,
scanners, found_in, aliases, or related_findings. Add a projection that
excludes them and surface a warning when the cap is reached so we know
when a scope is silently truncated.

* feat(analytics): warn when user-scope project list is truncated

The user-scope analytics path silently capped accessible projects at
10k. Surface a warning with the user id so we can detect the day a
deployment grows past it instead of producing a quietly-incomplete
analytics view.

* perf(housekeeping): skip never-scanned projects in rescan loop

The rescan scheduler iterated every project every cycle, even though
projects with no last_scan_at can't be rescanned and were dropped on
the first check inside the loop. Filter server-side so the cursor
returns only candidates that could plausibly qualify.
The /signup endpoint accepted UserCreate, which inherited permissions,
is_active, and auth_provider from UserBase and splatted them straight
into the User model. An unauthenticated request could therefore set
arbitrary permissions on the new account.

Introduce a dedicated UserSignup schema that only exposes safe fields
(email, username, password, optional notification metadata) and build
the User explicitly with hardcoded permissions=[], is_active=True,
auth_provider="local".
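The shape of that fix might look like the sketch below (field names trimmed to the essentials; the optional notification metadata is omitted):

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class UserSignup(BaseModel):
    """Only the fields an unauthenticated caller may set (sketch)."""
    model_config = ConfigDict(extra="forbid")  # reject permissions, is_active, ...

    email: str
    username: str
    password: str

def build_user(signup: UserSignup) -> dict:
    # Privileged fields are hardcoded, never copied from the request body.
    return {
        "email": signup.email,
        "username": signup.username,
        "permissions": [],
        "is_active": True,
        "auth_provider": "local",
    }

user = build_user(UserSignup(email="a@example.com", username="alice", password="s3cret!pw"))

try:
    UserSignup(email="b@example.com", username="bob", password="pw",
               permissions=["system:manage"])
    smuggled = True
except ValidationError:
    smuggled = False
```

With `extra="forbid"` a body that tries to smuggle `permissions` fails validation outright instead of being silently ignored.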
Pull in the dependabot-style version bumps from main (poetry + pnpm
lock files plus pyproject/package.json constraint relaxations) and
fix three useEffect-based state syncs that the upgraded
eslint-plugin-react-hooks now flags via the
react-hooks/set-state-in-effect rule. The fixes use the
adjust-state-during-render pattern recommended by the React docs
instead of suppressing the rule.
# Conflicts:
#	backend/poetry.lock
#	frontend/package.json
#	frontend/pnpm-lock.yaml
The previous validator used str.startswith against a tuple of allowed
prefixes, which let several attacks slip through:

  * userinfo bypass: http://localhost@evil.com/ matched the localhost
    prefix but resolved to evil.com on delivery
  * suffix bypass: http://localhost.evil.com/ likewise matched
  * any RFC1918, link-local, multicast, reserved or cloud-metadata IP
    literal (incl. 169.254.169.254) was accepted under https://
  * webhook delivery itself never re-checked the resolved IP, so a
    public hostname could rebind to an internal address

Replace the prefix check with a urlparse + ipaddress based validator
that:
  * forces scheme http or https (case-insensitive)
  * rejects empty hostnames and known cloud-metadata DNS names
  * allows plain HTTP only for loopback hosts
  * rejects IP literals in private/loopback/link-local/multicast/
    reserved/unspecified ranges
  * gates loopback targets behind WEBHOOK_ALLOW_LOCALHOST so production
    can disable in-pod delivery entirely

Add assert_safe_webhook_target as a delivery-time guard that resolves
the hostname and rejects if any returned address falls into a blocked
range, mitigating DNS rebinding. Wire it into both _send_with_retries
and test_webhook; treat its ValueError as a non-retriable policy
rejection.
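A condensed sketch of the urlparse + ipaddress approach; this omits the delivery-time DNS resolution step (assert_safe_webhook_target) and hardcodes the localhost toggle as a keyword argument rather than reading WEBHOOK_ALLOW_LOCALHOST:

```python
import ipaddress
from urllib.parse import urlparse

METADATA_HOSTS = {"169.254.169.254", "metadata.google.internal"}

def validate_webhook_url(url: str, *, allow_localhost: bool = False) -> None:
    """Raise ValueError for unsafe webhook targets (sketch of the approach)."""
    parsed = urlparse(url)
    scheme = (parsed.scheme or "").lower()
    if scheme not in {"http", "https"}:
        raise ValueError("scheme must be http or https")
    # urlparse.hostname strips userinfo, so http://localhost@evil.com
    # yields host 'evil.com' here, defeating the userinfo bypass.
    host = parsed.hostname
    if not host:
        raise ValueError("empty hostname")
    if host in METADATA_HOSTS:
        raise ValueError("cloud metadata endpoint is blocked")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # not an IP literal
    if ip is not None and (
        ip.is_private or ip.is_loopback or ip.is_link_local
        or ip.is_multicast or ip.is_reserved or ip.is_unspecified
    ) and not (ip.is_loopback and allow_localhost):
        raise ValueError(f"blocked IP literal: {ip}")
    is_loopback = host == "localhost" or (ip is not None and ip.is_loopback)
    if is_loopback and not allow_localhost:
        raise ValueError("loopback target; enable WEBHOOK_ALLOW_LOCALHOST to permit")
    if scheme == "http" and not is_loopback:
        raise ValueError("plain HTTP is only allowed for loopback hosts")

def _raises(url: str) -> bool:
    try:
        validate_webhook_url(url)
        return False
    except ValueError:
        return True

rejected = [
    url for url in (
        "http://localhost@evil.com/",       # userinfo bypass
        "http://localhost.evil.com/",       # suffix bypass
        "https://169.254.169.254/latest/",  # cloud metadata
        "https://10.0.0.5/hook",            # RFC1918 literal
    )
    if _raises(url)
]
```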
Three fixes that together let the backend pipeline pass:

- Add MongoDB 7 and Redis 7 service containers to the test job. The
  chat-AI assistant tests (test_chat_repository.py,
  test_chat_rate_limiter.py) and the crypto policy seeder tests
  hard-code localhost:27017 / localhost:6379 connections, but the
  workflow had no services declared so they always errored on connect.
- Type _is_blocked_ip / _parse_ip with IPv4Address | IPv6Address
  instead of the internal ipaddress._BaseAddress. The internal class
  doesn't expose is_private / is_loopback / is_link_local / etc., so
  mypy reported attr-defined plus a no-any-return cascade.
- Extend the FindingType enum membership test with the
  CRYPTO_KEY_MANAGEMENT value introduced in phase 3 — the test set
  was hard-coded against the pre-phase-3 enum and now diverges.
The crypto analyzers (crypto_weak_algorithm, crypto_weak_key,
crypto_quantum_vulnerable, crypto_certificate_lifecycle,
crypto_protocol_cipher) emit findings in the canonical Finding shape
but were missing from ResultAggregator's normalizer dispatch. Their
output landed in analysis_results and was silently dropped before
reaching the findings collection — a CBOM scan with weak crypto would
end with findings_count=0 even though the analyzers reported issues.

Add a crypto normalizer that rehydrates each dict into a Finding and
register it for all five crypto analyzer names.
End-to-end testing exposed three connected bugs that together caused
CBOM-only scans to record zero findings even when weak crypto was
present:

1. cbom_ingest never tagged the scan with scan_type="cbom". The
   analysis engine keys on this to force crypto analyzers into the
   active set and to synthesise an empty SBOM pass for CBOM-only
   scans, so without it neither happened.
2. The Scan model didn't declare a scan_type field, so even when the
   ingest path set it on the document, Pydantic stripped it on
   read — getattr(scan_doc, "scan_type", None) was always None.
3. _process_sbom bailed on the synthesised empty {} via
   "if not current_sbom", aborting before the analyzer loop ran.
   Switch to "is None" so empty dicts pass through, and skip the
   SBOM-format scanners (trivy/grype/osv/deps_dev) when no real
   SBOM content was resolved — they would crash on the empty dict
   otherwise.
Reports persist artifact_gridfs_id as a string for JSON-roundtrip
friendliness. Motor's GridFS bucket APIs (open_download_stream,
delete) reject string ids and raise InvalidId / KeyError, which the
download endpoint catches and surfaces as 410 Gone with "Artifact
storage error" — every successful report download was broken.

Wrap the string in ObjectId() at the call sites in the download
endpoint and the retention sweep. Update the format-coverage test
fixture to mint ObjectId-shaped keys so the fake bucket lines up
with production semantics.
The hotspot enrichment grouped findings by 'details.rule_id' and
matched against item.key — but item.key carries the asset dimension
value (e.g. 'MD5', 'RSA', 'algorithm', 'hash') from the asset
aggregation, never a rule id. Every join missed and every hotspot
returned finding_count=0 / severity_mix={}, even when matching
findings clearly existed for the asset.

Pivot the enrichment based on group_by:
- name      -> details.asset_name
- primitive -> details.primitive
- asset_type -> details.asset_type

severity / weakness_tag don't have a clean per-asset path into the
findings collection; leave their counts at zero rather than show
junk.
A non-admin user could call PUT /users/<own-id> and pass
{"permissions": ["system:manage", ...]} or {"is_active": false} in
the body — the endpoint gated only on "caller has user:update OR is
self" and then forwarded the entire UserUpdate payload to the
repository, so any authenticated user could grant themselves arbitrary
permissions.

Three new gates, aligned with the project's fine-grained-permissions
model (no implicit admin role):

- Setting 'permissions' now requires the new
  user:manage_permissions capability — separating routine user
  edits (help-desk admins) from privilege management.
- Even with user:manage_permissions, the caller cannot grant a
  permission they don't already hold themselves (subset rule),
  so a privilege manager can't promote anyone above their own
  ceiling.
- Toggling 'is_active' on yourself is forbidden regardless of held
  permissions, to prevent self-lockout (or last-admin-disabled
  scenarios).

Reproduced the exploit on a running stack with a user holding only
user:read; PUT /users/<self> with {permissions: [system:manage, ...]}
elevated successfully against the unfixed code. After the fix the
same call returns 403 with a clear reason, and the subset rule
returns 403 listing the unauthorised permissions.
The two finding-bound hotspot dimensions both ran the asset-first
pipeline they couldn't satisfy:

- severity grouped crypto_assets by $severity, but crypto_assets
  doesn't carry a severity field — the pipeline always returned an
  empty result.
- weakness_tag mapped to $asset_type by mistake (copy-paste) and
  produced the same output as the asset_type grouping.

Add a separate finding-first aggregation path for these two: filter
crypto findings, group by severity (or by unwound
details.weakness_tags), and report finding_count plus the count of
distinct bom_refs as asset_count. The asset-bound dimensions
(name/primitive/asset_type) keep their existing path and behaviour.

Verified end-to-end against the legacy_crypto_mixed scan:
  group_by=severity     -> 4 HIGH + 1 MEDIUM (matches 5 findings)
  group_by=weakness_tag -> 3 cipher-suite weaknesses surfaced
                           (no-forward-secrecy, weak-cipher-rc4,
                            weak-mac-sha1)
… silently

normalize_crypto previously caught any Finding(**item) validation error
with a bare except: continue, so analyzer output drift (a renamed
field, an unexpected enum value) would silently delete findings from
the scan with zero visibility. Replace with a logger.warning that
includes the offending finding's id and type so operators see the drop
in the backend logs and can root-cause it.
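The pattern, sketched against a hypothetical subset of the canonical Finding shape:

```python
import logging

from pydantic import BaseModel, ValidationError

logger = logging.getLogger("aggregator")

class FindingSketch(BaseModel):
    # Hypothetical subset of the canonical Finding shape.
    id: str
    finding_type: str
    severity: str

def normalize_crypto(items: list[dict]) -> list[FindingSketch]:
    findings = []
    for item in items:
        try:
            findings.append(FindingSketch(**item))
        except ValidationError:
            # Log the drop instead of swallowing it: analyzer output drift
            # (a renamed field, an unexpected value) must stay visible.
            logger.warning(
                "Dropping malformed crypto finding id=%s type=%s",
                item.get("id"), item.get("finding_type"),
            )
    return findings

kept = normalize_crypto([
    {"id": "f1", "finding_type": "crypto_weak_algorithm", "severity": "HIGH"},
    {"id": "f2", "finding_type": "crypto_weak_key"},  # missing severity -> dropped
])
```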
…o findings

Crypto findings (crypto_weak_algorithm, crypto_weak_key,
crypto_quantum_vulnerable, crypto_weak_protocol, crypto_protocol_cipher,
the seven crypto_cert_* lifecycle types, and crypto_key_management)
were never seen by the recommendation engine — its dispatcher had
handlers for vulnerabilities/secrets/sast/iac/licenses/quality but no
crypto handler. A scan with five crypto findings produced
"recommendations: []" with summary["crypto_issues"] missing entirely.

Add app.services.recommendation.crypto.process_crypto, six new
RecommendationType enum values (REPLACE_WEAK_ALGORITHM,
INCREASE_KEY_SIZE, UPGRADE_PROTOCOL, PQC_MIGRATION,
ROTATE_CERTIFICATE, REPLACE_WEAK_CIPHER_SUITE), and dispatcher wiring
in recommendations.py. The handler groups findings by
(finding_type, asset_name) and emits a single recommendation per
group with priority derived from the worst severity, effort tuned per
type (cert rotation = LOW, PQC = HIGH), and per-type suggested
replacements (MD5/SHA-1 -> SHA-256 or SHA-3, DES/3DES/RC4 -> AES-256-GCM,
TLS<1.2 -> TLS 1.2+ AEAD, RSA<2048 -> 3072-bit, quantum-vulnerable
PKE/SIG -> route to /pqc-migration plan).

The endpoint summary now reports a crypto_issues counter and a crypto
key in finding_counts so dashboards stop hiding crypto remediations.

Verified end-to-end against the legacy_crypto_mixed scan: the five
crypto findings (MD5, RSA-1024 weak-key, RSA-1024 quantum-vulnerable,
TLS 1.0 weak-protocol, TLS 1.0 weak-algorithm) produce five
recommendations with the expected types and severities.
list_reports and get_report previously gated only on
get_current_active_user, so any authenticated user could enumerate
every report's metadata across every project, team, and global scope —
including scope_id, framework, and the requester's identity. The
download endpoint already passed each report through ScopeResolver;
the list/get pair leaked everything else.

Add a shared _user_can_see_report helper that runs the same
ScopeResolver(report.scope, report.scope_id) the download path uses.
list_reports filters the returned page in place (the result may shrink
below limit when the user has partial access; pagination of the full
underlying set stays stable). get_report returns 404 instead of 403 on
mismatch so callers can't probe for existence.
Iso19790Framework reused FIPS controls and replaced the prefix on the
ControlDefinition.control_id, but the closure inside
_make_disallowed_evaluator captured the FIPS prefix in its own
ControlResult.control_id assignment. Every disallowed-category result
in an ISO report therefore came back labelled FIPS-140-3-... — wrong
identifiers, broken renderer mapping, and mixed framework IDs in any
downstream consumer that filters by framework prefix.

Extract build_disallowed_algorithm_controls(data, control_id_prefix)
plus a control_id-aware _make_disallowed_evaluator factory in the
FIPS module, and have ISO build its own controls via the same factory
with the ISO-19790 prefix. Live verification: ISO report now emits
ISO-19790-HASH_FUNCTIONS / SYMMETRIC_CIPHERS / ASYMMETRIC /
RSA-MIN-2048.
The /ingest/cbom endpoint accepted bodies of arbitrary size; the
50_000-asset cap only takes effect after parse_cbom has fully
deserialised the payload. An authenticated client could submit a
multi-GB CBOM and force the worker to allocate it.

Add a Depends() that reads Content-Length and rejects with 413 when
above 25 MiB before Pydantic touches the body. Chunked uploads bypass
the header check; that path is bounded by the ASGI server's own
limits.

Verified with a 26 MiB payload: 413 response with the byte counts in
the detail.
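A framework-agnostic sketch of the guard; in the real endpoint this logic runs inside a FastAPI Depends() and raises HTTPException(413) rather than a custom exception.

```python
MAX_CBOM_BYTES = 25 * 1024 * 1024  # 25 MiB

class PayloadTooLarge(Exception):
    pass

def check_content_length(headers: dict[str, str]) -> None:
    """Reject oversized bodies before any JSON parsing happens.

    Chunked uploads send no Content-Length and pass through here;
    that path stays bounded by the ASGI server's own limits."""
    raw = headers.get("content-length")
    if raw is None:
        return
    try:
        declared = int(raw)
    except ValueError:
        return  # malformed header; let the framework handle it
    if declared > MAX_CBOM_BYTES:
        raise PayloadTooLarge(
            f"payload is {declared} bytes, limit is {MAX_CBOM_BYTES}"
        )

try:
    check_content_length({"content-length": str(26 * 1024 * 1024)})
    oversize_rejected = False
except PayloadTooLarge:
    oversize_rejected = True
```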
…erns

A CryptoRule with quantum_vulnerable=True and no match_name_patterns
matched every PKE/SIGNATURE/KEM asset, including post-quantum
primitives like ML-KEM and ML-DSA — exactly the algorithms the rule
exists to recommend migrating *to*.

Add a model_validator on CryptoRule that requires match_name_patterns
when quantum_vulnerable=True, and drop the now-redundant pattern
re-check in the matcher's quantum_vulnerable branch. The seeded NIST
PQC rule already supplies patterns, so existing deployments are
unaffected; user-authored rules now fail validation up front instead
of silently misfiring.
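The validator, sketched against an illustrative subset of the real CryptoRule fields:

```python
from pydantic import BaseModel, ValidationError, model_validator

class CryptoRuleSketch(BaseModel):
    # Illustrative subset of the real CryptoRule fields.
    rule_id: str
    quantum_vulnerable: bool = False
    match_name_patterns: list[str] = []

    @model_validator(mode="after")
    def _require_patterns_for_quantum(self):
        if self.quantum_vulnerable and not self.match_name_patterns:
            raise ValueError(
                "quantum_vulnerable=True requires match_name_patterns; "
                "an unscoped rule would also match PQC primitives like ML-KEM"
            )
        return self

scoped = CryptoRuleSketch(
    rule_id="nist-pqc-rsa",
    quantum_vulnerable=True,
    match_name_patterns=["^RSA", "^ECDSA"],
)

try:
    CryptoRuleSketch(rule_id="unscoped", quantum_vulnerable=True)
    unscoped_accepted = True
except ValidationError:
    unscoped_accepted = False
```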
…tIdentifier

In CycloneDX 1.6 parameterSetIdentifier is a string parameter-set name
("P-256", "ML-KEM-1024", ...), not always a key-size integer. The
parser only treated it as numeric, so any algorithm whose parameter
set isn't a bare integer (every ECC and PQC primitive) had
key_size_bits=None and could never trigger match_min_key_size_bits
rules.

Try int() on parameterSetIdentifier first (covers RSA/AES where the
field is conventionally a bit count), then fall back to common custom
properties (cryptography:key_size, key_size, keySize, ...). Coercion
failures log at debug level so operators can find producers that
emit unparseable values.
ScopeResolver._resolve_user ignores scope_id and only resolves the
caller's own projects, so the previous list/get fix happily resolved
another user's user-scope report — every authenticated caller could
read every user-scope report. The download path has the same gate, so
the artifact was reachable too.

Special-case scope=='user' in _user_can_see_report: only the
requested_by user (or a system:manage holder, kept as the same admin
escape hatch already used by delete_report) sees the report.
Reproduced the leak with two users on the live stack: admin creates a
user-scope report, lowpriv now gets 404 and the report doesn't appear
in lowpriv's list.
int(True) == 1 in Python, so a CBOM with parameterSetIdentifier:true
would silently set key_size_bits=1 and trip every match_min_key_size_bits
rule. Same hazard in the property-fallback path. Negative or zero key
sizes are also nonsense for any rule that triggers on
'asset.key_size_bits < threshold'.

Extract _coerce_positive_int that rejects bools (the isinstance check
must come before int() because bool is a subclass of int) and any
non-positive value. Use it for both parameterSetIdentifier and the
property fallbacks.

Verified: BoolKey, NegKey, ZeroKey -> key_size_bits=None;
GoodKey('2048') -> 2048.
Iso19790Framework reached into Fips1403Framework._data, a private
attribute that could be renamed without warning. Promote it to a
public 'data' cached_property and have ISO read through that. Pure
naming change — no behavioural difference.
_latest_scan_for_project pulled up to 1000 scans per project and
filtered/sorted in memory. The status-bound $in filter and the
created_at sort are both supported by the fake DB used in tests
(verified) and by Motor in production, so they belong in the query.

Replaces a per-project N×1000-doc transfer with a one-doc cursor; the
silent drop of older scans on high-throughput projects is also gone.
Schema rules tighten over time (most recently the
quantum_vulnerable-requires-patterns validator on CryptoRule). A
project override authored against an older schema would propagate
ValidationError out of CryptoPolicyResolver.resolve at scan time —
crashing every analysis run that touches that project until an
operator notices.

Add validate_persisted_policies(db) that walks the crypto_policies
collection at startup and logs a warning per non-validating document
without raising. Hook it after seed_crypto_policies in the startup
event so deployments get a single, time-of-startup signal instead of
a recurring runtime crash.
The previous list_reports fix post-filtered the page after fetching,
which meant pagination shrank silently when a caller had partial
scope access — pages of < limit results with no signal of how many
reports they couldn't see.

Move the scope check into a Mongo $or filter that captures every
scope the caller can see in one go (scope=user iff requested_by ==
caller, scope=project iff scope_id in caller's projects, scope=team
iff in caller's team list, scope=global iff analytics:global or
system:manage). Pass it as ComplianceReportRepository.list's new
extra_filter kwarg so skip/limit run on the already-restricted set
and pages always return up to limit accessible reports.
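Assembled as a plain dict, the $or filter might look like this; the user-dict shape and field names are illustrative, the scopes and permission names come from the commit text.

```python
def build_report_visibility_filter(user: dict) -> dict:
    """Assemble the visibility $or filter for list_reports (sketch)."""
    clauses = [
        {"scope": "user", "requested_by": user["id"]},
        {"scope": "project", "scope_id": {"$in": user["project_ids"]}},
        {"scope": "team", "scope_id": {"$in": user["team_ids"]}},
    ]
    if {"analytics:global", "system:manage"} & set(user["permissions"]):
        clauses.append({"scope": "global"})
    return {"$or": clauses}

lowpriv = {"id": "u1", "project_ids": ["p1"], "team_ids": [], "permissions": []}
admin = {"id": "u2", "project_ids": [], "team_ids": [], "permissions": ["system:manage"]}

lowpriv_filter = build_report_visibility_filter(lowpriv)
admin_filter = build_report_visibility_filter(admin)
```

Because skip/limit run after this filter, every page contains only reports the caller can see, and pages fill up to limit.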

The fake DB used by integration tests didn't recognise dotted field
paths ('members.user_id'), so ScopeResolver returned no projects in
those tests. Extend the fake _fake_match_doc with a recursive
_resolve_dotted helper that walks lists too — matching real Mongo
semantics — so the visibility branch resolves correctly under tests.
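A recursive dotted-path resolver along those lines could look like this (a sketch of the fake-DB helper's idea, not the actual test code):

```python
def resolve_dotted(doc, path: str) -> list:
    """Resolve a dotted field path the way MongoDB matching does:
    descend dicts by key and fan out across list elements,
    collecting every matched leaf value."""
    def _walk(node, parts):
        if not parts:
            yield node
            return
        head = parts[0]
        if isinstance(node, dict) and head in node:
            yield from _walk(node[head], parts[1:])
        elif isinstance(node, list):
            for item in node:
                yield from _walk(item, parts)
    return list(_walk(doc, path.split(".")))

project = {"members": [{"user_id": "u1", "role": "owner"},
                       {"user_id": "u2", "role": "viewer"}]}
matches = resolve_dotted(project, "members.user_id")
```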
When SHA-1 (or any algorithm flagged by both BSI TR-02102 and NIST
SP 800-131A, etc.) appears in a CBOM, both rules matched the same
asset and the analyzer emitted two separate findings — inflating
findings_count, severity_mix, and dashboard counters by 2x for every
multi-framework hit.

Group rules per asset before building findings: emit one finding per
(asset, finding_type), keep the strictest severity as the lead, and
record every matched rule under details.matched_rules so compliance
evaluators and audit views still see the full cross-framework
agreement. References are deduplicated and merged in the same step.

Verified live: a CBOM containing SHA-1 now produces 1 finding (not 2)
with both bsi-02102-sha1-deprecated and nist-131a-sha1 listed under
matched_rules.
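The grouping step described above can be sketched as follows; the dict shapes and reference URLs are illustrative stand-ins for the real models.

```python
SEVERITY_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}

def dedupe_rule_matches(matches: list[dict]) -> list[dict]:
    """Collapse multi-framework rule hits into one finding per
    (asset, finding_type)."""
    grouped: dict[tuple, list[dict]] = {}
    for match in matches:
        grouped.setdefault((match["bom_ref"], match["finding_type"]), []).append(match)
    findings = []
    for (bom_ref, finding_type), hits in grouped.items():
        lead = max(hits, key=lambda h: SEVERITY_RANK[h["severity"]])
        findings.append({
            "bom_ref": bom_ref,
            "finding_type": finding_type,
            "severity": lead["severity"],                   # strictest wins
            "matched_rules": [h["rule_id"] for h in hits],  # full agreement kept
            "references": sorted({r for h in hits for r in h.get("references", [])}),
        })
    return findings

findings = dedupe_rule_matches([
    {"bom_ref": "sha1", "finding_type": "crypto_weak_algorithm",
     "severity": "MEDIUM", "rule_id": "bsi-02102-sha1-deprecated",
     "references": ["https://example.org/bsi"]},
    {"bom_ref": "sha1", "finding_type": "crypto_weak_algorithm",
     "severity": "HIGH", "rule_id": "nist-131a-sha1",
     "references": ["https://example.org/nist"]},
])
```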
@morzan1001 morzan1001 merged commit 7f8890e into main Apr 29, 2026
6 of 7 checks passed